The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practices as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). Although a median of 80 working hours was spent on method development, a considerable portion of participants (32%) stated that they did not have enough time for it, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once, which was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps. A minimal sketch of the two most common strategies for oversized samples follows.
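As an illustration only: the NumPy sketch below shows patch-based training input preparation and the slice-wise 2D decomposition of a 3D volume. The function names, patch size, and volume shape are hypothetical, not taken from any surveyed solution.

```python
import numpy as np

def extract_random_patches(volume, patch_size=(64, 64, 64), n_patches=8, rng=None):
    """Sample random 3D patches from a volume too large to process at once."""
    rng = rng or np.random.default_rng(0)
    patches = []
    for _ in range(n_patches):
        start = [rng.integers(0, s - p + 1) for s, p in zip(volume.shape, patch_size)]
        sl = tuple(slice(st, st + p) for st, p in zip(start, patch_size))
        patches.append(volume[sl])
    return np.stack(patches)

volume = np.zeros((128, 256, 256), dtype=np.float32)
patches = extract_random_patches(volume)                 # patch-based training input
slices_2d = [volume[i] for i in range(volume.shape[0])]  # 3D task as a series of 2D tasks
```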
The interaction and dimension of points are two important axes in designing point operators for hierarchical 3D models. Yet these two axes are heterogeneous and challenging to explore fully. Existing works craft point operators along a single axis and reuse the crafted operator in all parts of 3D models, overlooking the opportunity to better combine point interactions and dimensions by exploiting the varying geometry/density of 3D point clouds. In this work, we establish PIDS, a novel paradigm that jointly explores point interactions and point dimensions to serve semantic segmentation on point cloud data. We build a large search space that jointly considers versatile point interactions and point dimensions, supporting point operators with various geometry/density considerations. The enlarged search space with heterogeneous search components calls for a better ranking of candidate models. To achieve this, we improve search space exploration by leveraging predictor-based Neural Architecture Search (NAS) and enhance prediction quality by assigning unique encodings to heterogeneous search components based on their priors. We thoroughly evaluate the networks crafted by PIDS on two semantic segmentation benchmarks, showing ~1% mIoU improvement on SemanticKITTI and S3DIS over state-of-the-art 3D models.
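To make the predictor-based ranking step concrete, here is a minimal sketch under strong assumptions: the encoding scheme (disjoint code ranges for heterogeneous components), the toy search space, and the use of scikit-learn's GradientBoostingRegressor as the predictor are all illustrative stand-ins, not the encoding or predictor actually used by PIDS.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

def encode(interaction_id, dims, n_interactions=4):
    # Heterogeneous components get disjoint code ranges: a one-hot block for
    # the interaction type, raw values for the dimension settings.
    onehot = np.eye(n_interactions)[interaction_id]
    return np.concatenate([onehot, np.asarray(dims, dtype=float)])

# Toy candidate pool with fake measured scores standing in for trained mIoU.
X = np.stack([encode(rng.integers(4), rng.integers(16, 128, size=3)) for _ in range(200)])
y = rng.random(200)

predictor = GradientBoostingRegressor().fit(X, y)

# Rank a fresh pool of candidates; only the top ones would be fully trained.
pool = np.stack([encode(rng.integers(4), rng.integers(16, 128, size=3)) for _ in range(1000)])
top_k = np.argsort(predictor.predict(pool))[::-1][:10]
```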
Adversarial training (AT) has been shown to be an effective way to endow deep neural networks with strong adversarial robustness. However, the high computational cost of AT prohibits its use in Federated Learning (FL) applications with limited computing power and small memory footprints, e.g., deploying large-scale AT on resource-constrained edge devices. Few previous studies have attempted to tackle these constraints simultaneously. In this paper, we propose a new framework named Federated Adversarial Decoupled Learning (FADE) to enable AT on resource-constrained edge devices in FL. FADE reduces computation and memory usage by applying Decoupled Greedy Learning (DGL) to federated adversarial training, so that in each communication round each client only needs to perform AT on a single small module of the entire model. In addition, we improve vanilla DGL with an added auxiliary weight decay that alleviates objective inconsistency and achieves better performance. FADE offers theoretical guarantees on both adversarial robustness and convergence. Experimental results also show that FADE can significantly reduce the computing resources consumed while maintaining almost the same accuracy and robustness as fully joint training.
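A rough PyTorch sketch of the decoupled idea, under several simplifying assumptions: the two-module split, the auxiliary heads, single-step FGSM as the inner attack, and modeling the auxiliary weight decay as a plain optimizer weight_decay are all stand-ins for illustration, not FADE's actual procedure.

```python
import torch
import torch.nn as nn

# Hypothetical decomposition: the global model as a list of modules, each with
# a small auxiliary head so one module can be trained greedily in isolation.
modules = nn.ModuleList([
    nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU()),
])
aux_heads = nn.ModuleList([
    nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10)),
    nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10)),
])

def client_round(k, x, y, eps=0.03, aux_decay=1e-4):
    """One communication round: the client adversarially trains only module k."""
    with torch.no_grad():                       # earlier modules stay frozen
        for m in modules[:k]:
            x = m(x)
    x_adv = x.clone().requires_grad_(True)      # single-step FGSM on module input
    loss = nn.functional.cross_entropy(aux_heads[k](modules[k](x_adv)), y)
    loss.backward()
    x_adv = (x_adv + eps * x_adv.grad.sign()).detach()

    params = list(modules[k].parameters()) + list(aux_heads[k].parameters())
    opt = torch.optim.SGD(params, lr=0.01, weight_decay=aux_decay)
    opt.zero_grad()
    nn.functional.cross_entropy(aux_heads[k](modules[k](x_adv)), y).backward()
    opt.step()

x, y = torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,))
client_round(k=1, x=x, y=y)
```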
Federated Learning (FL) is pervasive in privacy-focused IoT environments because it avoids privacy leakage by training models with gradients instead of raw data. Recent works show that uploaded gradients can be employed to reconstruct data, i.e., gradient leakage attacks, and several defenses have been designed to alleviate this risk by tweaking the gradients. However, these defenses exhibit weak resilience against threatening attacks, as their effectiveness builds on the unrealistic assumption that deep neural networks can be simplified to linear models. In this paper, without such unrealistic assumptions, we present a novel defense called Refiner which, instead of perturbing gradients, refines the ground-truth data to craft robust data that yields sufficient utility while carrying the least amount of private information; the gradients of the robust data are then uploaded. To craft robust data, Refiner promotes the gradients of critical parameters associated with the robust data to stay close to the ground-truth ones, while leaving the gradients of trivial parameters free to safeguard privacy. Moreover, to exploit the gradients of trivial parameters, Refiner utilizes a well-designed evaluation network to steer the robust data far away from the ground-truth data, thereby alleviating the privacy leakage risk. Extensive experiments across multiple benchmark datasets demonstrate the superior effectiveness of Refiner at defending against state-of-the-art attacks.
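A speculative PyTorch sketch of such a data-refinement loop follows; the criticality rule (largest-magnitude gradient entries), the loss weighting, and the way the evaluation network is applied are hypothetical simplifications, not Refiner's actual objective.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))   # stand-in client model
eval_net = nn.Sequential(nn.Flatten(), nn.Linear(784, 1)) # hypothetical evaluation network

def param_grads(x, y):
    loss = nn.functional.cross_entropy(model(x), y)
    return torch.autograd.grad(loss, model.parameters(), create_graph=True)

def refine(x, y, critical_frac=0.2, steps=50, lr=0.1, lam=1.0):
    """Craft robust data whose critical-parameter gradients match the real ones."""
    g_true = [g.detach() for g in param_grads(x, y)]
    # Hypothetical criticality rule: keep the largest-magnitude gradient entries.
    masks = [(g.abs() >= g.abs().flatten().kthvalue(
        int(g.numel() * (1 - critical_frac)) + 1).values) for g in g_true]
    x_robust = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_robust], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        g_rob = param_grads(x_robust, y)
        match = sum(((gr - gt) * m).pow(2).sum() for gr, gt, m in zip(g_rob, g_true, masks))
        privacy = -lam * eval_net((x_robust - x).abs()).mean()  # push away from ground truth
        (match + privacy).backward()
        opt.step()
    return x_robust.detach()   # the client uploads gradients of this, not of x

x, y = torch.rand(4, 1, 28, 28), torch.randint(0, 10, (4,))
x_rob = refine(x, y)
```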
It is widely believed that the higher the uncertainty of a word in a caption, the more inter-correlated context information is required to determine it. However, current image captioning methods usually generate all words in a sentence sequentially and treat them equally. In this paper, we propose an uncertainty-aware image captioning framework that, in parallel and iteratively, inserts discontinuous candidate words between existing words from easy to difficult until convergence. We hypothesize that high-uncertainty words in a sentence need more prior information to be decided correctly and should therefore be produced at a later stage. The resulting non-autoregressive hierarchy makes caption generation explainable and intuitive. Specifically, we utilize an image-conditioned bag-of-words model to measure word uncertainty and apply a dynamic programming algorithm to construct the training pairs. During inference, we devise an uncertainty-adaptive parallel beam search technique that yields an empirically logarithmic time complexity. Extensive experiments on the MS COCO benchmark show that our approach outperforms the strong baseline and related methods in both captioning quality and decoding speed.
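As a toy illustration of the easy-to-difficult insertion schedule only (not the paper's image-conditioned uncertainty model, DP pair construction, or beam search; the words and scores below are made up):

```python
import numpy as np

def schedule_insertions(words, uncertainty, n_stages=3):
    """Toy schedule: low-uncertainty words are inserted in early stages,
    high-uncertainty words are deferred until more context exists."""
    order = np.argsort(uncertainty)             # easy -> difficult
    stages = np.array_split(order, n_stages)
    placed = []
    for stage in stages:
        # Parallel step: all words of this stage are inserted at once,
        # at their original positions relative to words already placed.
        placed = sorted(set(placed) | set(stage.tolist()))
        print([words[i] for i in placed])
    return [words[i] for i in placed]

words = ["a", "man", "riding", "a", "wave", "on", "a", "surfboard"]
uncertainty = [0.1, 0.6, 0.8, 0.1, 0.7, 0.2, 0.1, 0.9]  # hypothetical scores
schedule_insertions(words, uncertainty)
```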
Outcome prediction is crucial for head and neck cancer patients, as it can provide prognostic information for early treatment planning. Radiomics methods have been widely used for outcome prediction from medical images, but they are limited by their reliance on laborious manual segmentation of tumor regions. Recently, deep learning methods have been proposed to perform end-to-end outcome prediction and thus remove the reliance on manual segmentation. Unfortunately, without segmentation masks, these methods take the whole image as input, which makes it difficult for them to focus on tumor regions and potentially prevents them from fully leveraging the prognostic information within those regions. In this study, we propose a radiomics-enhanced deep multi-task framework for outcome prediction from PET/CT images, in the context of the HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR 2022). Our novelty is to incorporate radiomics as an enhancement to our recently proposed Deep Multi-task Survival model (DeepMTS), which jointly learns to predict the survival risk scores of patients and the segmentation masks of tumor regions. Radiomics features are extracted from the predicted tumor regions and combined with the predicted survival risk scores for final outcome prediction, through which the prognostic information in tumor regions can be further leveraged. Our method achieved a C-index of 0.681 on the testing set, placing 2nd on the leaderboard, only 0.00068 lower in C-index than 1st place.
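A small sketch of the fusion step, with several assumptions: the hand-rolled first-order features below stand in for a real radiomics library such as pyradiomics, and the fusion weights are hypothetical rather than the learned combination used in the paper.

```python
import numpy as np

def simple_radiomics(image, mask):
    """A few first-order radiomics-style features from the predicted tumor region."""
    roi = image[mask > 0]
    return np.array([roi.mean(), roi.std(), roi.min(), roi.max(), float(mask.sum())])

def final_risk(image, pred_mask, deep_risk, w=None):
    """Combine a DeepMTS-style risk score with radiomics from the predicted mask."""
    feats = simple_radiomics(image, pred_mask)
    z = (feats - feats.mean()) / (feats.std() + 1e-8)
    w = np.ones(len(z) + 1) if w is None else w   # hypothetical fusion weights
    return w[0] * deep_risk + w[1:] @ z

image = np.random.rand(64, 64, 64)                          # toy PET/CT volume
mask = (np.random.rand(64, 64, 64) > 0.99).astype(np.uint8) # toy predicted mask
print(final_risk(image, mask, deep_risk=0.42))
```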
Recently, vector quantized autoregressive (VQ-AR) models have shown remarkable results in text-to-image synthesis by predicting discrete image tokens one after another, with equal importance, from the top left to the bottom right of the latent space. Although this simple generative process works surprisingly well, is it the best way to generate an image? For instance, human creation tends to proceed from the outline to the fine details of an image, whereas VQ-AR models do not consider the relative importance of each component. In this paper, we present a progressive denoising model for high-fidelity text-to-image generation. The proposed method creates new image tokens from coarse to fine, based on the existing context, in a parallel manner, and this procedure is applied recursively until the image sequence is completed. The resulting coarse-to-fine hierarchy makes the image generation process intuitive and interpretable. Extensive experiments demonstrate that the progressive model produces significantly better results than the previous VQ-AR method in FID score across a wide variety of categories and aspects. Moreover, the text-to-image generation time of traditional AR models increases linearly with the output image resolution and is hence quite time-consuming even for normal-size images. In contrast, our approach achieves a better trade-off between generation quality and speed.
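A minimal NumPy sketch of coarse-to-fine parallel decoding in this spirit; the confidence-based keep schedule and the stand-in predictor are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def progressive_decode(predict_logits, seq_len=256, vocab=1024, steps=8, mask_id=-1):
    """Coarse-to-fine parallel decoding: at each step, predict all missing tokens
    at once, keep the most confident ones, and re-predict the rest next round."""
    tokens = np.full(seq_len, mask_id)
    for t in range(steps):
        logits = predict_logits(tokens)                 # (seq_len, vocab)
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)
        cand, conf = probs.argmax(-1), probs.max(-1)
        conf[tokens != mask_id] = np.inf                # already-fixed tokens stay
        n_keep = int(seq_len * (t + 1) / steps)         # hypothetical schedule
        keep = np.argsort(-conf)[:n_keep]
        new = np.full(seq_len, mask_id)
        new[keep] = np.where(tokens[keep] != mask_id, tokens[keep], cand[keep])
        tokens = new
    return tokens

# Stand-in predictor; a real model would condition on the text prompt.
rng = np.random.default_rng(0)
tokens = progressive_decode(lambda t: rng.random((256, 1024)))
```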
To build recommender systems that not only consider user-item interactions represented as ordinal variables but also exploit the social network describing the relationships between users, we develop a hierarchical Bayesian model called Ordinal Graph Factor Analysis (OGFA), which jointly models user-item and user-user interactions. OGFA not only achieves good recommendation performance but also extracts interpretable latent factors corresponding to representative user preferences. We further extend OGFA to the Ordinal Graph Gamma Belief Network, a multi-stochastic-layer deep probabilistic model that captures users' preferences and social communities at multiple semantic levels. For efficient inference, we develop a parallel hybrid Gibbs-EM algorithm that exploits the sparsity of the graph and is scalable to large datasets. Our experimental results show that the proposed models not only outperform state-of-the-art baselines on recommendation datasets with explicit or implicit feedback but also provide interpretable latent representations.
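A very rough skeleton of sparsity-aware alternating updates on a toy ratings matrix, to show why sparse storage matters for scalability: inference only touches observed entries. The multiplicative correction below is a crude stand-in for the paper's actual Gibbs and EM conditionals, and every name and constant is hypothetical.

```python
import numpy as np
from scipy.sparse import random as sparse_random

# Toy (users x items) matrix stored sparse: updates iterate over nonzeros only.
R = sparse_random(1000, 500, density=0.01, format="csr", random_state=0)

K = 16                                  # latent factors
U = np.random.gamma(1.0, 1.0, (R.shape[0], K))
V = np.random.gamma(1.0, 1.0, (R.shape[1], K))

for it in range(10):
    rows, cols = R.nonzero()
    resid = np.asarray(R[rows, cols]).ravel() - (U[rows] * V[cols]).sum(1)
    # Stand-in update step (the real algorithm samples from conditional
    # posteriors and takes EM steps; this is only the access pattern).
    step = 0.01 * resid[:, None]
    np.add.at(U, rows, step * V[cols])
    np.add.at(V, cols, step * U[rows])
    U, V = np.clip(U, 1e-6, None), np.clip(V, 1e-6, None)
```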
Human motion modeling is important for many modern graphics applications, which typically demand professional skills. To remove this skill barrier for laymen, recent motion generation methods can directly generate human motions conditioned on natural language. However, achieving diverse and fine-grained motion generation from various text inputs remains challenging. To address this problem, we propose MotionDiffuse, the first diffusion-model-based text-driven motion generation framework, which demonstrates several desirable properties over existing methods. 1) Probabilistic mapping. Instead of a deterministic language-to-motion mapping, MotionDiffuse generates motions through a series of denoising steps in which variations are injected. 2) Realistic synthesis. MotionDiffuse excels at modeling complicated data distributions and generating vivid motion sequences. 3) Multi-level manipulation. MotionDiffuse responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varying text prompts. Our experiments show that MotionDiffuse outperforms existing SoTA methods by convincing margins on both text-driven motion generation and action-conditioned motion generation. A qualitative analysis further demonstrates MotionDiffuse's controllability for comprehensive motion generation. Homepage: https://mingyuan-zhang.github.io/projects/motiondiffuse.html
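A generic DDPM-style sampling loop sketches the probabilistic-mapping property; the noise schedule, pose dimensions, and the stand-in denoiser are assumptions, as MotionDiffuse's actual model is a text-conditioned transformer with its own schedule.

```python
import torch

def sample_motion(denoiser, text_emb, n_frames=60, dim=72, steps=1000):
    """DDPM-style sampling: start from noise and iteratively denoise a motion
    sequence conditioned on a text embedding, injecting variation at each step."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(1, n_frames, dim)                    # pure noise "motion"
    for t in reversed(range(steps)):
        eps = denoiser(x, torch.tensor([t]), text_emb)   # predicted noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        x = mean + torch.sqrt(betas[t]) * torch.randn_like(x) if t > 0 else mean
    return x  # (1, n_frames, dim) pose sequence

# Stand-in denoiser; the real model conditions on the text via cross-attention.
dummy = lambda x, t, c: torch.zeros_like(x)
motion = sample_motion(dummy, text_emb=torch.zeros(1, 512))
```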
Overparameterized models with millions of parameters have achieved tremendous success. In this work, we ask: can the need for large models be, at least in part, due to the computational limitations of the learner? Additionally, is this situation exacerbated for robust learning? We show that this indeed could be the case. We exhibit learning tasks for which computationally bounded learners need significantly more model parameters than what information-theoretic learners require. Furthermore, we show that even more model parameters could be necessary for robust learning. In particular, for computationally bounded learners, we extend the recent result of Bubeck and Sellke [NeurIPS 2021], which shows that robust models may need more parameters, and show that bounded learners could provably need an even larger number of parameters. We then address the following related question: can we hope to remedy the situation for robust computationally bounded learning by restricting adversaries to also be computationally bounded, for the sake of obtaining models with fewer parameters? Here again, we show that this could be the case. Specifically, building on the work of Garg, Jha, Mahloujifar, and Mahmoody [ALT 2020], we demonstrate a learning task that can be learned efficiently and robustly against a computationally bounded attacker, while robust learning against an information-theoretic attacker requires the learner to use significantly more parameters.